From Dependencies to Constituents in the Reference Corpus for the Processing of Basque (EPEC)

Authors

  • Arantza Díaz de Ilarraza
  • Enrique Fernández-Terrones
  • Izaskun Aldezabal
  • María Jesús Aranzabe
Abstract

This paper explains the process of converting a dependency-based corpus into a constituent-based one. To this end, both the dependency and the constituent formalisms are first analyzed, and then the corresponding equivalences between linguistic phenomena are addressed. The process went through several phases in which the linguistic equivalences were improved. Finally, the evaluation process is briefly explained; as a result, we obtain corpora annotated in the two formalisms usually proposed for syntactic tagging. If the linguistic equivalences are the same, the conversion process could be extended to other corpora; otherwise, new equivalences would have to be defined.

Similar resources

Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque

The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators syntactically tagged a sample of the corpus in order to measure the agreement rate between them and to identify the issues that need to be improved in the syntactic annotation process. In this article we prese...

Building the Gold Standard for the Surface Syntax of Basque

In this paper, we present the construction process of SF-EPEC, a syntactically annotated 300,000-word corpus that aims to be a Gold Standard for the surface syntactic processing of Basque. First, the tagset designed for this purpose is described; since Basque is an agglutinative language, complex syntactic tags were sometimes needed. We also account for the different phases in the construct...

First approach toward Semantic Role Labeling for Basque

In this paper, we present the first Semantic Role Labeling system developed for Basque. The system is implemented using machine learning techniques and trained with the Reference Corpus for the Processing of Basque (EPEC). In our experiments the classifier that offers the best results is based on Support Vector Machines. Our system achieves 84.30 F1 score in identifying the PropBank semantic ro...

Online Processing of English Wh-Dependencies by Iranian EFL Learners

To reach the level of ultimate attainment in the second language, learners need to acquire not only the grammar of the L2 but also the language processing mechanisms involved in the comprehension of sentences in real time. Despite its importance, very little is known yet about online L2 processing. This study examines whether advanced Iranian learners of English reactivate disloc...

Coreferential Relations in Basque: The Annotation Process.

In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon highly dependent on the situational context, it shows some language-specific patterns that vary according to the features of each language. Due to the fact that Basque is not an Indo-European language, it differs considerably in grammar from the lang...

Journal:
  • Procesamiento del Lenguaje Natural

Volume: 41

Publication date: 2008